1. INTRODUCTION



This paper establishes a new probability distribution for comparing two sets of random
variables $\mathcal{X} = \{X_1, \dots, X_m\}$, $\mathcal{Y} = \{Y_1, \dots, Y_n\}$ (Theorem 3.7), where
$\mathcal{X} \cup \mathcal{Y}$ is a set of continuous, independent, and identically distributed (henceforth
abbreviated as c.i.i.d.) random variables. Our comparison method is inspired by dice comparisons
from the board game RISK, where two groups of dice are rolled, ranked, and top performers are then
pairwise compared. The essence of our derivation will be similar, taking pairwise comparisons
between the 1st, 2nd, ..., $k$th smallest values from each system ($k \le \min\{m,n\}$). Our
distribution then determines, for each $l$ between $0$ and $k$, the probability that exactly $l$ of
the $k$ comparisons favored system $\mathcal{Y}$. Thus, our probability mass function consists of four
integer parameters $k, l, m, n$. Throughout the rest of the paper, we assume $0 < k$,
$0 \le l \le k \le m \le n$, where the last inequality creates no loss in generality.

An immediate application is in reliability analysis. Using terminology from the field, our
distribution compares the performance of a $k$-out-of-$m$ system to a $k$-out-of-$n$ system, when
all components are c.i.i.d. Traditionally, $k$-out-of-$n$ systems are compared by comparing
the distributions of the respective $k$th order statistics (see, e.g., [5] or [7]). To make such a
comparison, the component distributions must be known. Since our model relies only on the
number of components $m, n$ and the number of pairwise comparisons $k$, but not on the underlying
distributions (Lemma 2.1), this paper should be viewed as a nonparametric alternative to
current comparison methods. Indeed, one may use this distribution as a performance test
between two groups of samples, if nothing is known about the distribution of the samples.
Also, one may quantify the increase in reliability of a $k$-out-of-$n$ system when the number
of components $n$ increases. Lastly, we discuss an interesting interpretation of our model, in
terms of random walks on $\mathbb{Z}$.

2. Reducing to uniformity 

Denote by $X^{(k)}$ the $k$-th order statistic of $X_1, \dots, X_n$, i.e., $X^{(k)}$ is the $k$th smallest
value of $\{X_1, \dots, X_n\}$. The following fundamental lemma simply conveys the exchangeability of
c.i.i.d. random variables for order statistics.

Lemma 2.1. Let $\mathcal{X} = \{X_1, \dots, X_n\}$ be a set of c.i.i.d. random variables, and let $X^{(i)}$ be
the $i$-th order statistic. For $1 \le i, j \le n$, $\Pr[X^{(i)} = X_j] = 1/n$.

Proof. For a fixed index $i$, $X^{(i)}$ must be one of $X_1, \dots, X_n$. By continuity and independence
of $X_1, \dots, X_n$, $\Pr[X_j = X_k] = 0$ over all pairs $j \ne k$. Hence,

\[ \Pr[X^{(i)} = X_1] + \cdots + \Pr[X^{(i)} = X_n] = 1. \]

That $X_1, \dots, X_n$ are identically distributed implies equality between any pair of summands.
The lemma follows. □

Remark 1. The assumption of continuity in Lemma 2.1 is essential. Indeed, $\Pr[X_i = X_j] > 0$
for $i \ne j$ if point masses were present.

Remark 2. Under the assumption of continuity, we may drop the case of equality when
comparing $X_j$'s, since

\[ \Pr\Big(\bigcup_{i \ne j} \{X_i = X_j\}\Big) = 0. \]

Now, let $X_1, \dots, X_m, Y_1, \dots, Y_n$ be c.i.i.d. random variables, mapping into a totally ordered
space $(\Omega, <)$. We seek a probability distribution that compares the bottom $k$ performers from
$\mathcal{X} = \{X_1, \dots, X_m\}$ to the bottom $k$ performers from $\mathcal{Y} = \{Y_1, \dots, Y_n\}$; we do so by
inducing an order from $(\Omega, <)$.

Definition 1. For random variables $X_1, X_2, \dots, X_m$, mapping into $(\Omega, <)$, an (ascending)
chain of length $d$ is the ordering

\[ X_{i_1} < X_{i_2} < \cdots < X_{i_d} \]

such that, for $1 \le j \le d$, $X_{i_j} = X^{(j)}$. Let $B_{m,n}$ be the set of ascending chains of length
$m + n$ from the set $\{X_1, \dots, X_m, Y_1, \dots, Y_n\}$ and $C_{m,n}$ the set of ascending chains of length
$m + n$ from the set $\{X^{(1)}, \dots, X^{(m)}, Y^{(1)}, \dots, Y^{(n)}\}$.

Obviously, $|B_{m,n}| = (m + n)!$ and $|C_{m,n}| = \binom{m+n}{m}$. When $X_1, \dots, X_m, Y_1, \dots, Y_n$ are
c.i.i.d., an easy corollary to Lemma 2.1 shows the probability distribution on $B_{m,n}$ must also be
uniform. As $C_{m,n}$ is the set of equivalence classes in $B_{m,n}$ under the action of $S_m \times S_n$, where
the $S_m$ factor permutes the $X_i$'s and $S_n$ permutes the $Y_j$'s, the probability distribution on
$C_{m,n}$ must also be uniform. We record this fact in Lemma 2.2.

Lemma 2.2. Let $X_1, \dots, X_m, Y_1, \dots, Y_n$ be a sequence of c.i.i.d. random variables. Then the
probability distribution on $C_{m,n}$ is uniform, with

\[ \Pr(c) = \frac{m!\,n!}{(m+n)!} = \frac{1}{\binom{m+n}{m}}, \qquad \forall\, c \in C_{m,n}. \]

Definition 2. For fixed $k$ and $0 \le l \le k$, denote by $<_{k,l}$ the ordering

\[ \mathcal{X} <_{k,l} \mathcal{Y} \]

if and only if $|\{i \mid 1 \le i \le k \text{ and } X^{(i)} < Y^{(i)}\}| = l$.

In other words, if $\mathcal{X} <_{k,l} \mathcal{Y}$, then of the $k$ bottom performers, there are exactly $l$
instances when the $i$-th bottom performer from system $\mathcal{X}$ underperformed the $i$-th bottom
performer from system $\mathcal{Y}$.

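Example 1. For $k = 4$, the chain

\[ X^{(1)} < Y^{(1)} < Y^{(2)} < X^{(2)} < X^{(3)} < X^{(4)} < X^{(5)} < Y^{(3)} < X^{(6)} < Y^{(4)} \]

in $C_{6,4}$ satisfies

\[ X^{(1)} < Y^{(1)}, \qquad X^{(2)} > Y^{(2)}, \qquad X^{(3)} < Y^{(3)}, \qquad X^{(4)} < Y^{(4)}, \]

and thus satisfies $\mathcal{X} <_{4,3} \mathcal{Y}$.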
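The statistic of Definition 2 is straightforward to compute from data. The following Python sketch is illustrative only: the function name `comparison_count` and the numeric samples are assumptions, chosen so that their joint ordering reproduces the chain of Example 1. It assumes the samples contain no ties, as guaranteed almost surely under continuity.

```python
def comparison_count(xs, ys, k):
    """Return l = #{1 <= i <= k : X^(i) < Y^(i)} for two tie-free samples."""
    if not 0 < k <= min(len(xs), len(ys)):
        raise ValueError("k must satisfy 0 < k <= min(m, n)")
    x_sorted, y_sorted = sorted(xs), sorted(ys)
    # Compare the i-th smallest X with the i-th smallest Y for i = 1..k.
    return sum(x < y for x, y in zip(x_sorted[:k], y_sorted[:k]))


# Hypothetical values realizing the chain of Example 1:
# X(1) < Y(1) < Y(2) < X(2) < X(3) < X(4) < X(5) < Y(3) < X(6) < Y(4)
xs = [1, 4, 5, 6, 7, 9]
ys = [2, 3, 8, 10]
print(comparison_count(xs, ys, 4))  # 3
```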

As the above example illustrates, for fixed $k$, each $c \in C_{m,n}$ is canonically associated to
an $l$ satisfying $\mathcal{X} <_{k,l} \mathcal{Y}$. By Lemma 2.2, our goal of determining

\[ \Pr(\mathcal{X} <_{k,l} \mathcal{Y}) \]

amounts to counting those chains in $C_{m,n}$ satisfying $\mathcal{X} <_{k,l} \mathcal{Y}$.

3. Counting



We say that a lattice path is a path which starts at $(0, 0)$ stepping only in the north $(0, 1)$
and east $(1, 0)$ directions. If we let $\Gamma_{m,n}$ be the set of lattice paths from $(0, 0)$ to $(m, n)$,
then $C_{m,n}$ is in bijective correspondence with $\Gamma_{m,n}$; this can be seen by constructing a lattice
path from reading a chain in ascending order, taking a north step when a $Y^{(j)}$ is encountered and
an east step for an $X^{(i)}$. Conversely, an ascending chain can be constructed from a lattice
path $\gamma \in \Gamma_{m,n}$ by traversing $\gamma$ in the northeast direction, writing $X^{(i)}$ when the $i$th
horizontal edge is encountered, and writing $Y^{(i)}$ when the $i$th vertical edge is encountered.
Under this bijection, it is clear that

\[ |\Gamma_{m,n}| = |C_{m,n}| = \binom{m+n}{m}. \]

Example 2. The chain

\[ X^{(1)} < Y^{(1)} < Y^{(2)} < X^{(2)} < X^{(3)} < X^{(4)} < X^{(5)} < Y^{(3)} < X^{(6)} < Y^{(4)} \]

in $C_{6,4}$ corresponds to the path RUURRRRURU in $\Gamma_{6,4}$, depicted in Figure 1.

Figure 1. RUURRRRURU in $\Gamma_{6,4}$
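The bijection between chains and lattice paths can be sketched in a few lines of Python. The helper names `chain_to_path` and `path_to_chain` and the sample values are assumptions introduced for illustration; R denotes an east (horizontal) step and U a north (vertical) step, as in Example 2.

```python
def chain_to_path(xs, ys):
    """Read the merged sample in ascending order: R for an X, U for a Y."""
    labelled = [(v, "R") for v in xs] + [(v, "U") for v in ys]
    return "".join(step for _, step in sorted(labelled))


def path_to_chain(path):
    """Inverse map: the i-th R becomes X(i), the i-th U becomes Y(i)."""
    counts = {"R": 0, "U": 0}
    names = {"R": "X", "U": "Y"}
    chain = []
    for step in path:
        counts[step] += 1
        chain.append(f"{names[step]}({counts[step]})")
    return " < ".join(chain)


# Hypothetical tie-free samples whose ordering matches Example 2.
xs = [1, 4, 5, 6, 7, 9]
ys = [2, 3, 8, 10]
path = chain_to_path(xs, ys)
print(path)                 # RUURRRRURU
print(path_to_chain(path))
```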



By Lemma 2.2, if $X_1, \dots, X_m, Y_1, \dots, Y_n$ are c.i.i.d., and $\Gamma_{m,n}$ is equipped with the uniform
distribution, then the bijection outlined above is also measure preserving. Now, to see what
$<_{k,l}$ means in $\Gamma_{m,n}$, we introduce the notion of exceedance.

Definition 3. A path $\gamma \in \Gamma_{m,n}$ is said to have horizontal exceedance equal to $l$ iff $\gamma$ has $l$
horizontal edges below the line $y = x$. Similarly, $\gamma$ has vertical exceedance equal to $l$ iff $\gamma$
has $l$ vertical edges above the line $y = x$. We say $\gamma$ has $k$-horizontal exceedance equal to $l$
iff $l$ of the first $k$ horizontal edges lie below the line $y = x$, and $\gamma$ has $k$-vertical exceedance
equal to $l$ iff exactly $l$ of the first $k$ vertical edges lie above the line $y = x$.

The horizontal and vertical exceedance of $\gamma$ will henceforth be denoted by $HE(\gamma)$ and
$VE(\gamma)$ respectively. Similarly, denote by $HE_k(\gamma)$ and $VE_k(\gamma)$ the $k$-horizontal and $k$-vertical
exceedance, respectively.

Remark 3. When $m = n$, vertical exceedance $l$ paths are typically called $(n, l)$-flawed Dyck
paths in the literature [8].

Example 3. The $\gamma$ = RUURRRRURU of Example 2 has horizontal exceedance 5 and vertical
exceedance 1. If we set $k = 4$, then $\gamma$ has 4-horizontal exceedance 3, and 4-vertical exceedance
1. The 4-exceedances of $\gamma$ are depicted in Figure 2.
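The exceedances of Definition 3 can be read off a path in a single pass. The sketch below is an illustration under one stated convention: a horizontal edge from $(x, y)$ to $(x+1, y)$ is counted as below the line when $x \ge y$, and a vertical edge from $(x, y)$ to $(x, y+1)$ as above it when $y \ge x$. The function name `exceedances` is an assumption.

```python
def exceedances(path, k=None):
    """Return (HE, VE) for a path written as a string of 'R' and 'U' steps.

    When k is given, only the first k horizontal and the first k vertical
    edges are inspected, which yields (HE_k, VE_k).
    """
    x = y = 0
    he = ve = 0
    for step in path:
        if step == "R":
            # Horizontal edge (x, y) -> (x + 1, y): below y = x iff x >= y.
            if (k is None or x < k) and x >= y:
                he += 1
            x += 1
        else:
            # Vertical edge (x, y) -> (x, y + 1): above y = x iff y >= x.
            if (k is None or y < k) and y >= x:
                ve += 1
            y += 1
    return he, ve


print(exceedances("RUURRRRURU"))     # (5, 1)
print(exceedances("RUURRRRURU", 4))  # (3, 1)
```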




Our next lemma relates our method of comparing chains (Definition 2) to counting paths in
$\Gamma_{m,n}$ of a certain exceedance.

Lemma 3.1. Let $X_1, \dots, X_m, Y_1, \dots, Y_n$ be a sequence of c.i.i.d. random variables. Then,
for $0 \le l \le k \le m \le n$,

\[ \Pr(\{c \in C_{m,n} \mid \mathcal{X} <_{k,l} \mathcal{Y}\}) = \Pr(\{\gamma \in \Gamma_{m,n} \mid HE_k(\gamma) = l\}). \]

Proof. Fix $0 < k \le m \le n$, and $c \in C_{m,n}$ with corresponding $\gamma \in \Gamma_{m,n}$. Let $l$ be such that

\[ \mathcal{X} <_{k,l} \mathcal{Y} \]

is satisfied by $c$. Then, for exactly $l$ indices $j$ between $1$ and $k$, $X^{(j)} < Y^{(j)}$. Equivalently
stated, there are $l$ instances when $X^{(j)}$ is appended to an initial chain that contains at least
as many $X^{(i)}$'s as $Y^{(i)}$'s. The corresponding statement in $\Gamma_{m,n}$ says there are exactly $l$
instances where the $j$-th horizontal edge appears before the $j$-th vertical edge in the initial
segment of $\gamma$ lying in the $k \times k$ box (i.e., the set $[0, k] \times [0, k]$). It is at these indices where a
horizontal edge is appended to a path whose initial segment terminates on or below the line $y = x$,
and this happens exactly $l$ times inside the $k \times k$ box. Thus $HE_k(\gamma) = l$. □
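Lemma 3.1 can be checked exhaustively for small parameters. The following sketch is a sanity check rather than a proof: it enumerates every path in $\Gamma_{m,n}$, treats step positions as ranks in the associated chain, and compares the statistic of Definition 2 with the $k$-horizontal exceedance. The helper names and the choice $m = 3$, $n = 4$ are assumptions.

```python
from itertools import combinations


def all_paths(m, n):
    """Yield every lattice path to (m, n) as a string of R (east) and U (north)."""
    for east in combinations(range(m + n), m):
        steps = ["U"] * (m + n)
        for pos in east:
            steps[pos] = "R"
        yield "".join(steps)


def chain_statistic(path, k):
    """l of Definition 2: position t in the path is the rank t in the chain."""
    x_ranks = [t for t, s in enumerate(path) if s == "R"]
    y_ranks = [t for t, s in enumerate(path) if s == "U"]
    return sum(x_ranks[i] < y_ranks[i] for i in range(k))


def he_k(path, k):
    """Number of the first k horizontal edges that lie below the line y = x."""
    x = y = count = 0
    for step in path:
        if step == "R":
            count += x < k and x >= y
            x += 1
        else:
            y += 1
    return count


m, n = 3, 4
agree = all(chain_statistic(p, k) == he_k(p, k)
            for k in range(1, m + 1) for p in all_paths(m, n))
print(agree)
```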

Corollary 3.2. Let $X_1, \dots, X_m, Y_1, \dots, Y_n$ be a sequence of c.i.i.d. random variables. Then,
for fixed $k$,

\[ \Pr(\{c \in C_{n,m} \mid \mathcal{X} >_{k,l} \mathcal{Y}\}) = \Pr(\{\gamma \in \Gamma_{m,n} \mid VE_k(\gamma) = l\}). \]

Proof. Follows by reflecting through the line $y = x$, and applying Lemma 3.1. □

By Lemma 2.1, and the fact that $|C_{m,n}| = |\Gamma_{m,n}| \, (= \binom{m+n}{m})$, finding
$|\{c \in C_{n,m} \mid \mathcal{X} >_{k,l} \mathcal{Y}\}|$ is equivalent to counting the number of elements in
$\{\gamma \in \Gamma_{m,n} \mid HE_k(\gamma) = l\}$. We count the latter set by first producing a closed formula for
the number of paths with horizontal exceedance equal to $l$ that terminate at a lattice point
$(x, y)$. We then outline an enumeration scheme to count $k$-horizontal exceedant paths in $\Gamma_{m,n}$.

To carry out the first step, write $\#(x, y, l)$ for the number of lattice paths $\eta$ that terminate
at the point $(x, y)$, satisfying $HE(\eta) = l$. Clearly, $\#(x, y, l) = 0$ if $l > x$. When $x = y$, a
celebrated theorem of Chung and Feller [I] states that lattice paths terminating at $(i, i)$ are
partitioned evenly among all possible exceedances. In particular, $\#(i, i, l)$ is the $i$th Catalan
number $C_i$. The next theorem simply combines the previous two facts.

Theorem 3.3 (Chung-Feller).

\[ \#(i, i, l) = \begin{cases} C_i = \dfrac{1}{i+1}\dbinom{2i}{i} & \text{if } 0 \le l \le i, \\ 0 & \text{otherwise.} \end{cases} \tag{3.1} \]

A beautiful bijective proof of the Chung-Feller Theorem can be found in [6].
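The even partition asserted by the Chung-Feller Theorem is easy to observe by brute force. The sketch below is only an enumeration check for one small value of $i$ (the choice $i = 4$ and the helper names are assumptions); it tallies the horizontal exceedance of every lattice path ending at $(i, i)$.

```python
from collections import Counter
from itertools import combinations
from math import comb


def horizontal_exceedance(path):
    """Count horizontal edges (x, y) -> (x + 1, y) with x >= y."""
    x = y = he = 0
    for step in path:
        if step == "R":
            he += x >= y
            x += 1
        else:
            y += 1
    return he


def exceedance_counts(i):
    """Tally HE over all lattice paths from (0, 0) to (i, i)."""
    tally = Counter()
    for east in combinations(range(2 * i), i):
        east = set(east)
        path = "".join("R" if t in east else "U" for t in range(2 * i))
        tally[horizontal_exceedance(path)] += 1
    return tally


i = 4
catalan = comb(2 * i, i) // (i + 1)   # C_4 = 14
print(catalan, dict(exceedance_counts(i)))
```

Each exceedance value $0, 1, \dots, i$ should appear exactly $C_i$ times in the tally.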

The following Lemma and Corollary reduce the task of computing $\#(x, y, l)$ to the case
when $y < x$.

Lemma 3.4. For $\gamma \in \Gamma_{x,y}$ and $0 < k \le \min\{x, y\}$,

\[ HE_k(\gamma) + VE_k(\gamma) = k. \tag{3.2} \]

Proof. Observe that the $k$-exceedance of $\gamma \in \Gamma_{x,y}$ is completely determined by its initial segment
having only one vertex, the terminal vertex, lying on the boundary of the $k \times k$ box. Such
an initial segment has a canonical extension to a path in $\Gamma_{k,k}$ by appending edges from the
terminal vertex to $(k, k)$, and this extension does not affect exceedances. So, if $HE_k(\gamma) = l$,
then there is a unique path $\tilde{\gamma}$ in $\Gamma_{k,k}$ that agrees with $\gamma$'s aforementioned initial segment,
making $HE(\tilde{\gamma}) = l$ as well. Similarly $VE(\tilde{\gamma}) = VE_k(\gamma)$. By monotonicity, $\tilde{\gamma}$ has $2l$ edges
below the line $y = x$, and each path in $\Gamma_{k,k}$ has exactly $2k$ edges, so $\tilde{\gamma}$ has $2(k - l)$ edges
above $y = x$, thus $VE(\tilde{\gamma}) = k - l$, and

\[ HE_k(\gamma) + VE_k(\gamma) = HE(\tilde{\gamma}) + VE(\tilde{\gamma}) = k. \]

The lemma is proved. □

Corollary 3.5. For $\gamma \in \Gamma_{x,y}$,

\[ HE(\gamma) + VE(\gamma) = \max\{x, y\}. \tag{3.3} \]

Proof. By noting that $\gamma \in \Gamma_{x,y}$ extends uniquely to a path in $\Gamma_{\max\{x,y\},\max\{x,y\}}$ without
affecting exceedances, and setting $k = \max\{x, y\}$, the Corollary follows from Lemma 3.4. □

By reflecting through the line $y = x$, Corollary 3.5 yields the identity

\[ \#(x, y, l) = \#(y, x, \max\{x, y\} - l) \tag{3.4} \]

so we need only to determine $\#(x, y, l)$ for the case $y < x$. To do so, we invoke another
classical result, due to Bertrand [3], which states there are

\[ \binom{x+y}{y} - \binom{x+y}{y-1} \]

lattice paths from $(0, 0)$ to $(x, y)$, $x \ge y$, that never go above the line $y = x$ (so contact at
$y = x$ is allowed). An easy extension (PQ,[2]) shows that there are

\[ \binom{x+y-1}{y} - \binom{x+y-1}{y-1} = \frac{x-y}{x+y}\binom{x+y}{y} \tag{3.5} \]

lattice paths from $(0, 0)$ to $(x, y)$ that stay strictly below the line $y = x$, except at $(0, 0)$,
known as the ballot number $b(x, y)$.
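Both path counts above can be compared against direct enumeration. The following sketch is a small check under stated assumptions: the function names are illustrative, "never above" is read as every vertex $(a, b)$ of the path satisfying $b \le a$, and "strictly below" as $b < a$ at every vertex after the origin.

```python
from itertools import combinations
from math import comb


def bertrand(x, y):
    """Paths to (x, y), x >= y, that never go above the line y = x."""
    return comb(x + y, y) - (comb(x + y, y - 1) if y >= 1 else 0)


def ballot(x, y):
    """b(x, y): paths to (x, y), x > y, strictly below y = x after the origin."""
    return (x - y) * comb(x + y, y) // (x + y)


def brute_force(x, y, strict):
    """Enumerate all paths to (x, y) and keep those obeying the constraint."""
    total = 0
    for east in combinations(range(x + y), x):
        east = set(east)
        a = b = 0
        ok = True
        for t in range(x + y):
            if t in east:
                a += 1
            else:
                b += 1
            if b > a or (strict and b == a):
                ok = False
                break
        total += ok
    return total


print(bertrand(5, 3), brute_force(5, 3, strict=False))
print(ballot(5, 3), brute_force(5, 3, strict=True))
```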



Theorem 3.6. For $x > y$, if $l$ satisfies $x - y \le l \le x$,

\[ \#(x, y, l) = \sum_{i=x-l}^{y} C_i \, b(x - i, y - i) \tag{3.6} \]

and $\#(x, y, l) = 0$ for $l$ otherwise.

Proof. Fix $(x, y)$ and $l$. We partition $A = \{\eta \in \Gamma_{x,y} \mid HE(\eta) = l\}$ into the (possibly empty)
sets $A_0, A_1, \dots, A_y$, where $A_i$ is the set of lattice paths $\eta$ that last cross the line $y = x$ at the
point $(i, i)$. To ensure $A$ is nonempty, $l$ is at most $x$, and since a lattice path to $(x, y)$ has
horizontal exceedance at least $x - y$,

\[ \#(x, y, l) = 0, \quad \text{if } l > x \text{ or } l < x - y. \]

Because $x > y$, the terminal segment of $\eta \in A_i$, i.e., the segment of $\eta$ from $(i, i)$ to $(x, y)$,
must lie below the line $y = x$, by the "last cross" property of $A_i$. Thus, the terminal segment
of $\eta \in A_i$ will always contribute $(x - i)$ to the horizontal exceedance, mandating a horizontal
exceedance of $l - (x - i)$ in $\eta$'s initial segment. Thus, to ensure $A_i$ is nonempty,

\[ l - (x - i) \ge 0, \quad \text{or} \quad i \ge x - l. \]

So nonempty $A_i$ run from $i = x - l$ to $i = y$. We call such indices admissible. To count
nonempty $A_i$, we multiply the number of initial segments by the number of terminal segments
as they were described above. By Theorem 3.3, the number of initial segments terminating at
$(i, i)$ of horizontal exceedance $l - (x - i)$ is

\[ C_i = \frac{1}{i+1}\binom{2i}{i}. \]

Applying equation (3.5), the number of terminal (monotonic) paths from $(i, i)$ to $(x, y)$ that
stay below the line $y = x$ is given by

\[ b(x - i, y - i) = \frac{x - y}{x + y - 2i}\binom{x + y - 2i}{y - i}, \]

making

\[ |A_i| = C_i \, b(x - i, y - i). \]

Summing over admissible $i$ completes the theorem. □
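Equation (3.6) translates directly into code and can be compared with exhaustive enumeration for small endpoints. The sketch below assumes the strict case $x > y$ treated by the theorem; the function names `count_he` and `count_he_brute` are illustrative.

```python
from itertools import combinations
from math import comb


def catalan(i):
    return comb(2 * i, i) // (i + 1)


def ballot(x, y):
    """Ballot number b(x, y) for x > y, as in equation (3.5)."""
    return (x - y) * comb(x + y, y) // (x + y)


def count_he(x, y, l):
    """#(x, y, l) for x > y via equation (3.6); zero outside x - y <= l <= x."""
    if not x > y >= 0:
        raise ValueError("this sketch covers only x > y >= 0")
    if l < x - y or l > x:
        return 0
    return sum(catalan(i) * ballot(x - i, y - i) for i in range(x - l, y + 1))


def count_he_brute(x, y, l):
    """Count paths to (x, y) with exactly l horizontal edges below y = x."""
    total = 0
    for east in combinations(range(x + y), x):
        east = set(east)
        a = b = he = 0
        for t in range(x + y):
            if t in east:
                he += a >= b   # edge (a, b) -> (a + 1, b) is below iff a >= b
                a += 1
            else:
                b += 1
        total += he == l
    return total


print([count_he(3, 1, l) for l in range(5)])
print([count_he_brute(3, 1, l) for l in range(5)])
```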

Remark 4. By deleting the terminal edge, one can see equation (3.6) satisfies the necessary
recursion relation

\[ \#(x, y, l) = \#(x - 1, y, l - 1) + \#(x, y - 1, l) \]

at those points $(x, y)$ with $x > y$.

Denote by $\#(m, n, k, l)$ the number of lattice paths terminating at $(m, n)$ having $k$-horizontal
exceedance $l$. We now prove the main theorem.

Theorem 3.7.

\[ \#(m, n, k, l) = \sum_{j=0}^{k-1} I(\{j \ge l\}) \, |B_{(j,k)}(l)| + \sum_{j=0}^{k-1} I(\{j \ge k - l\}) \, |B_{(k,j)}(l)| \tag{3.7} \]

where $I$ is the indicator function,

\[ |B_{(k,j)}(l)| = \binom{m+n-k-j}{m-k} \sum_{i=k-l}^{j} C_i \, b(k - 1 - i, j - i), \quad 0 \le j < k - 1, \]

\[ |B_{(j,k)}(l)| = \binom{m+n-k-j}{n-k} \sum_{i=l}^{j} C_i \, b(k - 1 - i, j - i), \quad 0 \le j < k - 1, \]

\[ |B_{(k-1,k)}(l)| = \binom{m+n-2k+1}{n-k} C_{k-1}, \]

\[ |B_{(k,k-1)}(l)| = \binom{m+n-2k+1}{m-k} C_{k-1}. \]

Proof. The proof will have a similar flavor to the proof of Theorem 3.6. Fix $m$, $n$, $k$, and $l$.
As stated before, the $k$-horizontal exceedance of $\gamma \in \Gamma_{m,n}$ is determined once its initial segment
first reaches either of the lines $x = k$, $y = k$, whence we invoke Theorem 3.6. So, we
partition $B = \{\gamma \in \Gamma_{m,n} \mid HE_k(\gamma) = l\}$ into the (possibly empty) sets

\[ B_{(0,k)}, B_{(1,k)}, \dots, B_{(k-1,k)}, B_{(k,k-1)}, \dots, B_{(k,1)}, B_{(k,0)}, \]

where $B_{(j,k)}$ (resp. $B_{(k,j)}$) is the set of $\gamma \in \Gamma_{m,n}$ with $HE_k(\gamma) = l$, possessing an initial
segment to first cross the line $y = k$ (resp. $x = k$) at the point $(j, k)$ (resp. $(k, j)$). We will
work first with $B_{(k,j)}$. Notice, for a lattice path $\gamma \in B_{(k,j)}$, the initial segment that terminates
at the vertex $(k, j)$ determines $k$-horizontal exceedance. As $j < k$, this initial segment has
its last edge below the line $y = x$, and by definition of $B_{(k,j)}$ this edge must be horizontal.
Since this last edge contributes 1 to the horizontal exceedance, $\#(k - 1, j, l - 1)$ counts the
number of such initial segments in $B_{(k,j)}$. This argument holds except at $j = k - 1$, where
$\#(k - 1, k - 1, l - 1)$ is $C_{k-1}$, by Theorem 3.3. Once $k$-exceedance is determined, we may
append to the initial segment any of the

\[ \binom{m+n-k-j}{m-k} \]

lattice paths from $(k, j)$ to $(m, n)$, constructing a path in $B_{(k,j)}$. Multiplying, we get, for $j$
satisfying $k - l \le j < k - 1$,

\[ |B_{(k,j)}| = \#(k - 1, j, l - 1) \binom{m+n-k-j}{m-k} = \binom{m+n-k-j}{m-k} \sum_{i=k-l}^{j} C_i \, b(k - 1 - i, j - i), \]

and

\[ |B_{(k,k-1)}| = \binom{m+n-2k+1}{m-k} C_{k-1}. \]

Counting paths in $B_{(j,k)}$ is slightly different, since the initial segment that determines
$k$-horizontal exceedance has its last edge above the line $y = x$, and this last edge is vertical.
Since this edge doesn't contribute to horizontal exceedance, $\#(j, k - 1, l)$ counts the number
of such initial segments in $B_{(j,k)}$. As before, we may append to this initial segment any of
the

\[ \binom{m+n-k-j}{n-k} \]

lattice paths from $(j, k)$ to $(m, n)$, to construct a path in $B_{(j,k)}$. Recall the reflection
identity (Eq. 3.4)

\[ \#(j, k - 1, l) = \#(k - 1, j, k - 1 - l). \]

Thus, for $l \le j < k - 1$,

\[ |B_{(j,k)}| = \#(k - 1, j, k - 1 - l) \binom{m+n-k-j}{n-k} = \binom{m+n-k-j}{n-k} \sum_{i=l}^{j} C_i \, b(k - 1 - i, j - i), \]

and

\[ |B_{(k-1,k)}| = \binom{m+n-2k+1}{n-k} C_{k-1}. \]

It remains to show which of the sets

\[ B_{(0,k)}, B_{(1,k)}, \dots, B_{(k-1,k)}, B_{(k,k-1)}, \dots, B_{(k,1)}, B_{(k,0)} \]

are nonempty, for our $l$.

If $l = 0$, no horizontal edges lie below $y = x$ in the $k \times k$ box, so the nonempty sets are

\[ B_{(0,k)}, \dots, B_{(k-1,k)}. \]

Summing, we get

\[ \#(m, n, k, 0) = \sum_{j=0}^{k-1} |B_{(j,k)}|. \]

If $l = k$, all horizontal edges lie below $y = x$ in the $k \times k$ box, so the nonempty sets are

\[ B_{(k,0)}, \dots, B_{(k,k-1)}. \]

Summing, we get

\[ \#(m, n, k, k) = \sum_{j=0}^{k-1} |B_{(k,j)}|. \]

Case $0 < l < k$: $HE_k(\gamma) = l$ admits nonemptiness in only the sets

\[ B_{(k,k-l)}, \dots, B_{(k,k-1)}, B_{(k-1,k)}, \dots, B_{(l,k)}. \]

Summing, we get

\[ \#(m, n, k, l) = \sum_{j=k-l}^{k-1} |B_{(k,j)}| + \sum_{j=l}^{k-1} |B_{(j,k)}|. \]

All cases align precisely with formula (3.7), as stated. □
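The counting scheme of Theorem 3.7 can be exercised numerically. The following Python sketch implements formula (3.7) as reconstructed from the proof above and compares it with a brute-force enumeration of $\Gamma_{m,n}$; the function names and the decomposition into a shared helper `block` are assumptions made for illustration, and the sketch presumes $0 < k \le \min\{m, n\}$.

```python
from itertools import combinations
from math import comb


def catalan(i):
    return comb(2 * i, i) // (i + 1)


def ballot(x, y):
    return (x - y) * comb(x + y, y) // (x + y)


def block(k, j, lo, tail):
    """Size of one set B; `tail` is the number of extensions to (m, n)."""
    if j == k - 1:                      # Chung-Feller case
        return tail * catalan(k - 1)
    return tail * sum(catalan(i) * ballot(k - 1 - i, j - i)
                      for i in range(lo, j + 1))


def count_hek(m, n, k, l):
    """#(m, n, k, l) following formula (3.7)."""
    total = 0
    for j in range(k):
        if j >= l:        # B_(j,k): the path leaves the box through y = k
            total += block(k, j, l, comb(m + n - k - j, n - k))
        if j >= k - l:    # B_(k,j): the path leaves the box through x = k
            total += block(k, j, k - l, comb(m + n - k - j, m - k))
    return total


def count_hek_brute(m, n, k, l):
    """Enumerate all paths to (m, n) and count those with HE_k = l."""
    total = 0
    for east in combinations(range(m + n), m):
        east = set(east)
        a = b = he = 0
        for t in range(m + n):
            if t in east:
                he += a < k and a >= b
                a += 1
            else:
                b += 1
        total += he == l
    return total


print([count_hek(2, 3, 2, l) for l in range(3)])        # [5, 3, 2]
print([count_hek_brute(2, 3, 2, l) for l in range(3)])  # [5, 3, 2]
```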

Remark 5. Theorem 3.7 provides the probability distribution in question. That is,

\[ \Pr(\mathcal{X} <_{k,l} \mathcal{Y}) = \frac{\#(m, n, k, l)}{\binom{m+n}{m}}. \tag{3.8} \]
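For numerical work, the distribution (3.8) can also be tabulated without the closed form. The sketch below is not the enumeration scheme of Theorem 3.7: it swaps in a plain dynamic program over lattice points that tracks the $k$-horizontal exceedance, then divides by $\binom{m+n}{m}$ using exact rational arithmetic. The function name `distribution` is an assumption.

```python
from fractions import Fraction
from math import comb


def distribution(m, n, k):
    """Return [Pr(X <_{k,l} Y) for l = 0..k] as exact fractions."""
    # State (x, y, l): current lattice point and k-horizontal exceedance so far.
    counts = {(0, 0, 0): 1}
    for _ in range(m + n):
        nxt = {}
        for (x, y, l), c in counts.items():
            if x < m:
                # East step; it counts iff it is among the first k
                # horizontal edges and lies below y = x (x >= y).
                key = (x + 1, y, l + (x < k and x >= y))
                nxt[key] = nxt.get(key, 0) + c
            if y < n:
                # North steps never change the horizontal exceedance.
                key = (x, y + 1, l)
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    total = comb(m + n, m)
    return [Fraction(counts.get((m, n, l), 0), total) for l in range(k + 1)]


print(distribution(2, 3, 2))
```

For $m = 2$, $n = 3$, $k = 2$ the ten paths split as 5, 3, 2 over $l = 0, 1, 2$, so the probabilities are 1/2, 3/10 and 1/5.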

4. Random Walks 
Lemma 3.1 established a bijection between the sets

\[ \{c \in C_{m,n} \mid \mathcal{X} <_{k,l} \mathcal{Y}\} \quad \text{and} \quad \{\gamma \in \Gamma_{m,n} \mid HE_k(\gamma) = l\}. \]

The mapping

\[ (1, 0) \mapsto +1, \qquad (0, 1) \mapsto -1 \]

effortlessly establishes a bijection between the set $\Gamma_{m,n}$ and the set $W_{m,n}$ of length $m + n$
integer walks on $\mathbb{Z}$ that start at $x = 0$ and terminate at $x = m - n$. As mentioned before,
by monotonicity in $\Gamma_{m,n}$, any $\gamma$ that satisfies $HE_k(\gamma) = l$ has exactly $2l$ of its first $2k$ steps
lying below the main diagonal. In $W_{m,n}$, these paths spend exactly $2l$ of their first $2k$ steps
above $x = 0$. Thus, if we let $T_{2k} : W_{m,n} \to \{0, 2, \dots, 2k\}$ be the number of initial $2k$ steps
lying above $x = 0$, we have yet another interpretation of $\mathcal{X} <_{k,l} \mathcal{Y}$:

\[ \Pr(\{c \in C_{m,n} \mid \mathcal{X} <_{k,l} \mathcal{Y}\}) = \Pr(\{w \in W_{m,n} \mid T_{2k}(w) = 2l\}). \]
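The random-walk reading can be checked by enumeration for small parameters. In the sketch below, which is only a sanity check, R maps to $+1$ and U to $-1$, and a step is counted as lying above $0$ when it joins two nonnegative positions of which at least one is positive; this convention and the helper names are assumptions.

```python
from itertools import combinations


def he_k(path, k):
    """Number of the first k horizontal edges that lie below the line y = x."""
    x = y = count = 0
    for step in path:
        if step == "R":
            count += x < k and x >= y
            x += 1
        else:
            y += 1
    return count


def time_above_zero(path, k):
    """T_2k: how many of the first 2k steps of the walk lie above 0."""
    pos = count = 0
    for step in path[: 2 * k]:
        new = pos + (1 if step == "R" else -1)
        count += max(pos, new) > 0
        pos = new
    return count


def all_paths(m, n):
    for east in combinations(range(m + n), m):
        east = set(east)
        yield "".join("R" if t in east else "U" for t in range(m + n))


m, n = 3, 4
consistent = all(time_above_zero(p, k) == 2 * he_k(p, k)
                 for k in range(1, m + 1) for p in all_paths(m, n))
print(consistent)
print(time_above_zero("RUURRRRURU", 4))  # 6
```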

5. Future Research 

It is desirable to understand the asymptotics of the distribution in eq. (3.8). Heuristic
evidence suggests that

\[ \frac{k}{\binom{m+n}{m}} \, \#(m, n, k, xk), \qquad x \in \{0, 1/k, 2/k, \dots, 1\} \tag{5.1} \]

is a discretized beta distribution, where the two shape parameters depend, obviously, on
$m$, $n$ and $k$. The nature of this dependency, however, could not be found. A particularly
intriguing case is a symmetric one, $n = 2k$, $m = 2k$, where it appears the distribution in
eq. (5.1) converges, in probability, to an arcsine-like distribution as $k \to \infty$. This would have
a remarkable implication in how our distribution relates, in terms of random walks, to the
celebrated arcsine law of Lévy.